Shallow Parsing By Weighted Probabilistic Sum

Authors

  • Young-Sook Hwang
  • So-Young Park
  • Hoo-Jung Chung
  • Yong-Jae Kwak
  • Hae-Chang Rim
Abstract

In this paper, we define the chunking problem as a classification of words and present a weighted probabilistic model for text chunking. The proposed model exploits context features around the focus word. To alleviate the sparse data problem, it integrates general features with specific features. In the training stage, we select useful features after measuring the information gain ratio of each feature, and we assign higher weights to more informative features according to that ratio. At application time, we classify words into chunk labels while checking the consistency of the beginning and end of each chunk. The experimental results show that the model combining general and specific features alleviates the sparse data problem. In addition, the weighted probabilistic model based on information gain ratio outperforms the non-weighted model.
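The abstract does not give the model's formula, so the following is only a minimal sketch of one plausible reading: each feature votes with its conditional label distribution, and the votes are summed with the feature's information gain ratio as the weight. The function names, the two features (`word` as a specific feature, `pos` as a general one), and the toy data are all hypothetical; feature selection and the chunk begin/end consistency check are not modeled.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a collection of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(samples, feature):
    """Information gain ratio of one feature over (features, label) samples."""
    labels = Counter(label for _, label in samples)
    by_value = defaultdict(Counter)
    for feats, label in samples:
        by_value[feats[feature]][label] += 1
    n = len(samples)
    remainder = sum(
        sum(c.values()) / n * entropy(c.values()) for c in by_value.values()
    )
    split_info = entropy([sum(c.values()) for c in by_value.values()])
    gain = entropy(labels.values()) - remainder
    return gain / split_info if split_info > 0 else 0.0


def train(samples, feature_names):
    """Return per-feature weights and per-feature label counts."""
    weights = {f: gain_ratio(samples, f) for f in feature_names}
    cond = {f: defaultdict(Counter) for f in feature_names}
    for feats, label in samples:
        for f in feature_names:
            cond[f][feats[f]][label] += 1
    return weights, cond


def classify(feats, weights, cond, labels):
    """Pick the label with the largest weighted sum of P(label | feature value)."""
    scores = {}
    for label in labels:
        score = 0.0
        for f, w in weights.items():
            counts = cond[f].get(feats[f])
            if counts:  # unseen feature values contribute nothing
                score += w * counts[label] / sum(counts.values())
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: "word" is a specific feature, "pos" a general one.
samples = [
    ({"word": "the", "pos": "DT"}, "B-NP"),
    ({"word": "cat", "pos": "NN"}, "I-NP"),
    ({"word": "sat", "pos": "VBD"}, "O"),
    ({"word": "a", "pos": "DT"}, "B-NP"),
    ({"word": "dog", "pos": "NN"}, "I-NP"),
    ({"word": "ran", "pos": "VBD"}, "O"),
]
weights, cond = train(samples, ["word", "pos"])
predicted = classify({"word": "mat", "pos": "NN"}, weights, cond,
                     ["B-NP", "I-NP", "O"])
print(weights, predicted)
```

In this toy run the word "mat" never appears in training, so only the general `pos` feature contributes to the score. That is the kind of back-off the abstract credits with easing the sparse data problem, although the paper's actual feature set and weighting details may differ.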

Similar resources

A Language-Independent Shallow-Parser Compiler

We present a rule-based shallow-parser compiler, which makes it possible to generate a robust shallow parser for any language, even in the absence of training data, by resorting to a very limited number of rules that aim at identifying constituent boundaries. We contrast our approach with other approaches used for shallow parsing (i.e. finite-state and probabilistic methods). We present an evaluation of o...

IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES Confidence Estimation in Structured Prediction

Structured classification tasks such as sequence labeling and dependency parsing have attracted much interest from the Natural Language Processing and machine learning communities. Several online learning algorithms have been adapted for structured tasks, such as Perceptron, Passive-Aggressive and the recently introduced Confidence-Weighted learning. These online algorithms are easy to implement, fast t...

The Effect of Rhythm on Structural Disambiguation in Chinese

The length of a constituent (number of syllables in a word or number of words in a phrase), or rhythm, plays an important role in Chinese syntax. This paper systematically surveys the distribution of rhythm in constructions in Chinese from the statistical data acquired from a shallow tree bank. Based on our survey, we then used the rhythm feature in a practical shallow parsing task by using rhy...

Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers

This paper proposes a layered Finite State Transducer (FST) framework that integrates hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing. The shallow parsing grammar is derived directly from the full-fledged grammar for natural language understanding, and augmented with top-level n-gram probabilities and phrase-level context-dependent probabilities, whi...

Publication date: 2001